Finding Statistically Significant Attribute Interactions

Authors

  • Andreas Henelius
  • Antti Ukkonen
  • Kai Puolamäki

Abstract

In many data exploration tasks it is meaningful to identify groups of attribute interactions that are specific to a variable of interest. For instance, in a dataset where the attributes are medical markers and the variable of interest (class variable) is binary indicating presence/absence of disease, we would like to know which medical markers interact with respect to the binary class label. These interactions are useful in several practical applications, for example, to gain insight into the structure of the data, in feature selection, and in data anonymisation. We present a novel method, based on statistical significance testing, that can be used to verify if the data set has been created by a given factorised class-conditional joint distribution, where the distribution is parametrised by a partition of its attributes. Furthermore, we provide a method, named astrid, for automatically finding a partition of attributes describing the distribution that has generated the data. State-of-the-art classifiers are utilised to capture the interactions present in the data by systematically breaking attribute interactions and observing the effect of this breaking on classifier performance. We empirically demonstrate the utility of the proposed method with examples using real and synthetic data.
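The core operation named in the abstract, breaking attribute interactions and observing the effect on a classifier, can be illustrated with a minimal sketch. It assumes discrete tabular data stored as Python lists and a given partition of the attribute indices. For each class label and each attribute group, the group's columns are shuffled jointly across the rows of that class. This keeps the dependencies inside each group and the class-conditional distribution of each group, but breaks dependencies between groups. All names here (`permute_within_class`, `fit_lookup`, `accuracy`, `xor_data`) are hypothetical. The lookup-table model is a deliberately simple stand-in for the state-of-the-art classifiers the authors use, and neither the paper's significance test nor the astrid search over partitions is reproduced.

```python
import random
from collections import Counter, defaultdict


def permute_within_class(X, y, partition, rng):
    """Shuffle each attribute group jointly within each class.

    Columns in the same group move together, so dependencies inside a group
    and the class-conditional distribution of each group are preserved;
    dependencies between different groups are broken.
    """
    Xp = [list(row) for row in X]  # copy; the original X is never modified
    rows_by_class = defaultdict(list)
    for i, label in enumerate(y):
        rows_by_class[label].append(i)
    for rows in rows_by_class.values():
        for group in partition:
            donors = rows[:]
            rng.shuffle(donors)  # independent shuffle per (class, group)
            for dst, src in zip(rows, donors):
                for j in group:
                    Xp[dst][j] = X[src][j]
    return Xp


def fit_lookup(X, y):
    """Hypothetical stand-in classifier: majority label per exact pattern."""
    votes = defaultdict(Counter)
    for row, label in zip(X, y):
        votes[tuple(row)][label] += 1
    table = {k: c.most_common(1)[0][0] for k, c in votes.items()}
    fallback = Counter(y).most_common(1)[0][0]  # used for unseen patterns
    return lambda row: table.get(tuple(row), fallback)


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def xor_data(copies=50):
    """Two binary attributes whose XOR is the class: a pure interaction."""
    X = [[a, b] for _ in range(copies) for a in (0, 1) for b in (0, 1)]
    y = [a ^ b for a, b in X]
    return X, y


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = xor_data()
    model = fit_lookup(X, y)  # trained on the unpermuted data
    for partition in ([[0, 1]], [[0], [1]]):
        Xp = permute_within_class(X, y, partition, rng)
        print(partition, accuracy(model, Xp, y))
```

On the XOR data the class depends only on the interaction of the two attributes, so the single-group partition `[[0, 1]]` leaves accuracy at 1.0, while the singleton partition `[[0], [1]]` destroys the interaction and accuracy falls to roughly chance level. A drop of this kind suggests the attributes should not be separated; deciding whether a given drop is statistically significant is the part of the method this sketch leaves out.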

Similar resources

A Framework for Finding Statistically Significant Differences

In the questionnaire analysis, finding whether there is a statistically significant difference between two or more groups in a continuous measure is one of the major problems in research. However, it is difficult for researchers to solve the issue of finding possible statistically significant difference, namely “Statistically Significant Difference Unawareness Issue”. There are two causes to ...

Algorithms for Efficient Mining of Statistically Significant Attribute Association Information

Knowledge of the association information between the attributes in a data set provides insight into the underlying structure of the data and explains the relationships (independence, synergy, redundancy) between the attributes and class (if present). Complex models learnt computationally from the data are more interpretable to a human analyst when such interdependencies are known. In this paper...

Association between Tumor Necrosis Factor-α -308 G/A Polymorphism and Multiple Sclerosis: A Systematic Review and Meta-Analysis

Multiple sclerosis (MS) is a complex polygenic disease in which gene-environment interactions are important. A number of studies have investigated the association between tumor necrosis factor-α (TNF-α) -308 G/A polymorphism (substitution G→A, designated as TNF1 and TNF2) and MS susceptibility in different populations, but the results of individual studies have been inconsistent. Therefore, per...

Breast cancer in first-degree relatives and risk of lung cancer: assessment of the existence of gene sex interactions.

BACKGROUND Previous studies have shown the sex differences in lung cancer and the associations between estrogen-related genes and non-small cell lung cancer. In the present study, we assumed the existence of shared candidate genes that are common in lung and breast cancers, and examined whether women with a family history of breast cancer are at increased risk of lung cancer compared with men, ...

Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — In the Case of Critical Dataset Size —

STRIM (Statistical Test Rule Induction Method) has been proposed as a method to effectively induct if-then rules from the decision table which is considered as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments specifying rules in advance, and by comparison with conventional methods. However, scope for future development remains be...

Journal:
  • CoRR

Volume: abs/1612.07597

Publication date: 2016